Abstract

Feature descriptors play a crucial role in a wide range
of geometry analysis and processing applications, including
shape correspondence, retrieval, and segmentation. In
this paper, we introduce Geodesic Convolutional Neural
Networks (GCNN), a generalization of the convolutional neural
network (CNN) paradigm to non-Euclidean manifolds. Our
construction is based on a local geodesic system of polar
coordinates to extract "patches", which are then passed
through a cascade of filters and linear and non-linear operators.
The coefficients of the filters and linear combination
weights are optimization variables that are learned to minimize
a task-specific cost function. We use GCNN to learn
invariant shape features, achieving state-of-the-art
performance in problems such as shape description, retrieval,
and correspondence.

1. Introduction 

Feature descriptors are ubiquitous tools in shape analysis.
Broadly speaking, a local feature descriptor assigns to each
point on the shape a vector in some multi-dimensional descriptor
space representing the local structure of the shape
around that point. A global descriptor describes the whole
shape. Local feature descriptors are used in higher-level
tasks such as establishing correspondence between shapes
[35], shape retrieval [8], or segmentation [43]. Global descriptors
are often produced by aggregating local descriptors,
e.g. using the bag-of-features paradigm. Descriptor construction
is largely application dependent, and one typically tries
to make the descriptor discriminative (capture the structures
that are important for a particular application, e.g. telling
apart two classes of shapes), robust (invariant to some class
of transformations or noise), compact (low dimensional),
and computationally efficient.

Previous work Early works on shape descriptors such
as spin images [19], shape distributions [34], and integral
volume descriptors [32] were based on extrinsic structures
that are invariant under Euclidean transformations. The following
generation of shape descriptors used intrinsic structures
such as geodesic distances [15] that are preserved by
isometric deformations. The success of image descriptors
such as SIFT [31], HOG [13], MSER [33], and shape context
[2] has led to several generalizations thereof to non-Euclidean
domains (see e.g. [49, 14, 24], respectively). The
works [11, 28] on diffusion and spectral geometry have led
to the emergence of intrinsic spectral shape descriptors that
are dense and isometry-invariant by construction. Notable
examples in this family include heat kernel signatures (HKS)
[45] and wave kernel signatures (WKS) [1].

Arguing that in many cases it is hard to model invariance
but rather easy to create examples of similar and dissimilar
shapes, Litman and Bronstein [29] showed that HKS and
WKS can be considered as particular parametric families
of transfer functions applied to the Laplace-Beltrami operator
eigenvalues and proposed to learn an optimal transfer
function. Their work follows the recent trends in the image
analysis domain, where hand-crafted descriptors are abandoned
in favor of learning approaches. The past decade in
computer vision research has witnessed the re-emergence
of "deep learning" and in particular, convolutional neural
network (CNN) techniques [ /, 27], which make it possible to learn
task-specific features from examples. CNNs have achieved a
breakthrough in performance in a wide range of applications such
as image classification [26], segmentation [10], detection
and localization [38, 42], and annotation [16, 21].

Learning methods have only recently started penetrating
into the 3D shape analysis community in problems such as
shape correspondence [39, 37], similarity [20], description
[29, 47, 12], and retrieval [30]. CNNs have been applied
to 3D data in the very recent works [48, 44] using standard
(Euclidean) CNN architectures applied to volumetric or 2D
view-based shape representations, making them unsuitable for
deformable shapes. Intrinsic versions of CNNs that would
allow dealing with shape deformations are difficult to formulate
due to the lack of shift invariance on Riemannian
manifolds; we are aware of two recent works in that direction
[9, 5].

Contribution In this paper, we propose Geodesic CNN
(GCNN), an extension of the CNN paradigm to non-Euclidean
manifolds based on a local geodesic system of coordinates
that are analogous to 'patches' in images. Compared
to previous works on non-Euclidean CNNs [9, 5], our model
is generalizable (i.e., it can be trained on one set of shapes
and then applied to another one), local, and able to capture
anisotropic structures. We show that HKS [45], WKS [1],
optimal spectral descriptors [29], and intrinsic shape context
[24] can be obtained as particular configurations of GCNN;
therefore, our approach is a generalization of previous popular
descriptors. Our experimental results show that our
model can be applied to achieve state-of-the-art performance
in a wide range of problems, including the construction of
shape descriptors, retrieval, and correspondence.


2. Background

We model a 3D shape as a connected smooth compact
two-dimensional manifold (surface) $X$, possibly with a
boundary $\partial X$. Locally around each point $x$ the manifold is
homeomorphic to a two-dimensional Euclidean space referred
to as the tangent plane and denoted by $T_x X$. A
Riemannian metric is an inner product
$\langle \cdot, \cdot \rangle_{T_x X} : T_x X \times T_x X \to \mathbb{R}$
on the tangent space depending smoothly on $x$.

Laplace-Beltrami operator (LBO) is a positive semidefinite
operator $\Delta_X f = -\mathrm{div}(\nabla f)$, generalizing the classical
Laplacian to non-Euclidean spaces. The LBO is intrinsic,
i.e., expressible entirely in terms of the Riemannian metric.
As a result, it is invariant to isometric (metric-preserving)
deformations of the manifold. On a compact manifold, the
LBO admits an eigendecomposition $\Delta_X \phi_k = \lambda_k \phi_k$ with
real eigenvalues $0 = \lambda_1 \leq \lambda_2 \leq \dots$. The corresponding
eigenfunctions $\phi_1, \phi_2, \dots$ form an orthonormal basis on
$L^2(X)$, which is a generalization of the Fourier basis to
non-Euclidean domains.

Heat diffusion on manifolds is governed by the diffusion
equation,

$$\left( \Delta_X + \frac{\partial}{\partial t} \right) u(x, t) = 0; \quad u(x, 0) = u_0(x), \qquad (1)$$

where $u(x, t)$ denotes the amount of heat at point $x$ at time
$t$, and $u_0(x)$ is the initial heat distribution; if the manifold has a
boundary, appropriate boundary conditions must be added.
The solution of (1) is expressed in the spectral domain as

$$u(x, t) = \int_X u_0(x') \underbrace{\sum_{k \geq 1} e^{-t \lambda_k} \phi_k(x) \phi_k(x')}_{h_t(x, x')} \, dx', \qquad (2)$$

where $h_t(x, x')$ is the heat kernel. Interpreting the LBO
eigenvalues as 'frequencies', the coefficients $e^{-t\lambda}$ play the
role of a transfer function corresponding to a low-pass filter
sampled at $\{\lambda_k\}_{k \geq 1}$.

Discretization In the discrete setting, the surface $X$ is sampled
at $N$ points $x_1, \dots, x_N$. On these points, we construct
a triangular mesh $(V, E, F)$ with vertices $V = \{1, \dots, N\}$,
in which each interior edge $ij \in E$ is shared by exactly
two triangular faces $ikj$ and $jhi \in F$, and boundary edges
belong to exactly one triangular face. The set of vertices
$\{j \in V : ij \in E\}$ directly connected to $i$ is called the 1-ring
of $i$. A real-valued function $f : X \to \mathbb{R}$ on the surface is
sampled on the vertices of the mesh and can be identified
with an $N$-dimensional vector $\mathbf{f} = (f(x_1), \dots, f(x_N))^\top$.
The discrete version of the LBO is given as an $N \times N$ matrix
$L = A^{-1} W$, where

$$w_{ij} = \begin{cases} (\cot \alpha_{ij} + \cot \beta_{ij}) / 2 & ij \in E; \\ -\sum_{k \neq i} w_{ik} & i = j; \\ 0 & \text{else}; \end{cases} \qquad (3)$$

$\alpha_{ij}, \beta_{ij}$ denote the angles $\angle ikj, \angle jhi$ of the triangles sharing
the edge $ij$, and $A = \mathrm{diag}(a_1, \dots, a_N)$ with
$a_i = \frac{1}{3} \sum_{jk : ijk \in F} A_{ijk}$ being the local area element at vertex $i$
and $A_{ijk}$ denoting the area of triangle $ijk$ [36].

The first $K \leq N$ eigenfunctions and eigenvalues of the
LBO are computed by performing the generalized eigendecomposition
$W \Phi = A \Phi \Lambda$, where $\Phi = (\phi_1, \dots, \phi_K)$
is an $N \times K$ matrix containing as columns the discretized
eigenfunctions and $\Lambda = \mathrm{diag}(\lambda_1, \dots, \lambda_K)$ is the diagonal
matrix of the corresponding eigenvalues.

3. Spectral descriptors

Many popular spectral shape descriptors are constructed
taking the diagonal values of heat-like operators. A generic
descriptor of this kind has the form

$$\mathbf{f}(x) = \sum_{k \geq 1} \boldsymbol{\tau}(\lambda_k) \phi_k^2(x) \approx \sum_{k=1}^{K} \boldsymbol{\tau}(\lambda_k) \phi_k^2(x), \qquad (4)$$

where $\boldsymbol{\tau}(\lambda) = (\tau_1(\lambda), \dots, \tau_Q(\lambda))^\top$ is a bank of transfer
functions acting on LBO eigenvalues, and $Q$ is the descriptor
dimensionality. Such descriptors are dense (computed at
every point $x$), intrinsic by construction, and typically can
be efficiently computed using a small number $K$ of LBO
eigenfunctions and eigenvalues.
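The discretization and the generic descriptor of the form $\sum_k \boldsymbol{\tau}(\lambda_k) \phi_k^2(x)$ can be illustrated with a minimal NumPy/SciPy sketch. It is not the authors' code: the function names `cotangent_laplacian` and `spectral_descriptor` are hypothetical, a dense solver is used (reasonable only for tiny meshes), and the stiffness matrix is assembled with a positive diagonal so that the computed eigenvalues are non-negative, in line with the positive semidefinite convention stated for the LBO.

```python
import numpy as np
from scipy.linalg import eigh


def cotangent_laplacian(verts, faces):
    """Return (W, a): dense cotangent stiffness matrix and per-vertex areas.

    Assumption: W is assembled with a positive diagonal and negative
    off-diagonal entries, so that W is positive semidefinite.
    """
    n = len(verts)
    W = np.zeros((n, n))
    a = np.zeros(n)
    for tri in faces:
        p = verts[tri]
        area = 0.5 * np.linalg.norm(np.cross(p[1] - p[0], p[2] - p[0]))
        a[tri] += area / 3.0                      # one third of each incident triangle
        for c in range(3):
            i, j, k = tri[c], tri[(c + 1) % 3], tri[(c + 2) % 3]
            u, v = verts[i] - verts[k], verts[j] - verts[k]
            cot = np.dot(u, v) / np.linalg.norm(np.cross(u, v))  # cot of angle at k
            W[i, j] -= 0.5 * cot
            W[j, i] -= 0.5 * cot
            W[i, i] += 0.5 * cot
            W[j, j] += 0.5 * cot
    return W, a


def spectral_descriptor(W, a, transfer_fns, K):
    """Evaluate f(x) = sum_k tau(lambda_k) phi_k(x)^2 at every vertex -> (N, Q)."""
    lam, phi = eigh(W, np.diag(a))                # solves W phi = lam A phi
    lam, phi = lam[:K], phi[:, :K]
    taus = np.stack([tau(lam) for tau in transfer_fns])   # Q x K
    return (phi ** 2) @ taus.T
```

For meshes of realistic size one would store $W$ and $A$ as sparse matrices and use an iterative eigensolver; the dense version above is only meant to make the formulas concrete.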


Heat kernel signature (HKS) [45] is a particular setting
of (4) using parametric low-pass filters of the form
$\tau_t(\lambda) = e^{-t\lambda}$, which allows interpreting them as diagonal values of
the heat kernel taken at some times $t_1, \dots, t_Q$. The physical
interpretation of the HKS is autodiffusivity, i.e., the amount
of heat remaining at point $x$ after time $t$, which is equal (up
to a constant) to the Gaussian curvature for small $t$. A notable
drawback of HKS stemming from the use of low-pass filters
is its poor spatial localization.
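As a small illustration of the low-pass filters $\tau_t(\lambda) = e^{-t\lambda}$, the sketch below evaluates the HKS from precomputed eigenpairs. The function name `heat_kernel_signature` and the array layout (eigenfunctions as columns of `phi`) are assumptions made for this example only.

```python
import numpy as np


def heat_kernel_signature(lam, phi, times):
    """HKS at every vertex.

    lam:   (K,) LBO eigenvalues
    phi:   (N, K) eigenfunctions sampled at the N vertices
    times: (Q,) diffusion times t_1, ..., t_Q
    Returns an (N, Q) array with entries sum_k exp(-t_q * lam_k) * phi_k(x)^2.
    """
    lam = np.asarray(lam, dtype=float)
    times = np.asarray(times, dtype=float)
    filters = np.exp(-np.outer(times, lam))       # Q x K bank of low-pass filters
    return (np.asarray(phi, dtype=float) ** 2) @ filters.T
```

Because every filter decays monotonically in $\lambda$, increasing $t$ suppresses more of the high-frequency eigenfunctions, which is the source of the poor localization noted above.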




Wave kernel signature (WKS) [1] arises from the model
of a quantum particle on the manifold possessing some
initial energy distribution, and boils down to a particular
setting of (4) with band-pass filters of the form
$\tau_\nu(\lambda) = \exp\left( -\frac{(\log \nu - \log \lambda)^2}{2\sigma^2} \right)$,
where $\nu$ is the initial mean energy of the
particle. WKS have better localization, but at the same time
tend to produce noisier matches.
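The band-pass counterpart can be sketched in the same way. This is an illustrative sketch only: the name `wave_kernel_signature` is hypothetical, the filter is the log-normal band-pass $\exp(-(\log \nu - \log \lambda)^2 / 2\sigma^2)$, any normalization constant is omitted, and the caller is assumed to have dropped the zero eigenvalue so that the logarithm is defined.

```python
import numpy as np


def wave_kernel_signature(lam, phi, energies, sigma):
    """WKS-style descriptor at every vertex (unnormalized).

    lam:      (K,) strictly positive LBO eigenvalues (zero eigenvalue removed)
    phi:      (N, K) matching eigenfunctions
    energies: (Q,) mean energies nu_1, ..., nu_Q (strictly positive)
    sigma:    bandwidth of the log-normal band-pass filters
    Returns an (N, Q) array.
    """
    log_lam = np.log(np.asarray(lam, dtype=float))
    log_nu = np.log(np.asarray(energies, dtype=float))
    diff = log_nu[:, None] - log_lam[None, :]             # Q x K
    filters = np.exp(-diff ** 2 / (2.0 * sigma ** 2))     # band-pass around each nu
    return (np.asarray(phi, dtype=float) ** 2) @ filters.T
```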

Optimal spectral descriptors (OSD) [29] use parametric
transfer functions expressed as

$$\tau_q(\lambda) = \sum_{m=1}^{M} a_{qm} \beta_m(\lambda) \qquad (5)$$

in the B-spline basis $\beta_1(\lambda), \dots, \beta_M(\lambda)$, where $a_{qm}$
($q = 1, \dots, Q$, $m = 1, \dots, M$) are the parametrization coefficients.
Plugging (5) into (4) one can express the $q$th component
of the spectral descriptor as

$$f_q(x) = \sum_{k \geq 1} \tau_q(\lambda_k) \phi_k^2(x) = \sum_{m=1}^{M} a_{qm} \underbrace{\sum_{k \geq 1} \beta_m(\lambda_k) \phi_k^2(x)}_{g_m(x)}, \qquad (6)$$

where $\mathbf{g}(x) = (g_1(x), \dots, g_M(x))^\top$ is a vector-valued function
referred to as geometry vector, dependent only on the
intrinsic geometry of the shape. Thus, (4) is parametrized by
the $Q \times M$ matrix $A = (a_{qm})$ and can be written in matrix
form as $\mathbf{f}(x) = A \mathbf{g}(x)$. The main idea of [29] is to learn
the optimal parameters $A$ by minimizing a task-specific loss,
which reduces to Mahalanobis-type metric learning.
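The factorization of the descriptor into a learnable matrix and fixed geometry vectors can be checked numerically. In the sketch below, piecewise-linear hat functions on uniformly spaced knots stand in for the B-spline basis (an assumption for brevity, not the basis used in [29]), and `hat_basis`, `geometry_vectors`, and `osd_descriptor` are hypothetical names.

```python
import numpy as np


def hat_basis(lam, knots):
    """Evaluate M piecewise-linear hat functions at the eigenvalues -> (M, K).

    Assumption: `knots` are uniformly spaced; this is a stand-in for the
    B-spline basis beta_1, ..., beta_M.
    """
    lam = np.asarray(lam, dtype=float)
    knots = np.asarray(knots, dtype=float)
    h = knots[1] - knots[0]
    return np.clip(1.0 - np.abs(lam[None, :] - knots[:, None]) / h, 0.0, None)


def geometry_vectors(lam, phi, knots):
    """g_m(x) = sum_k beta_m(lam_k) * phi_k(x)^2 for all vertices -> (N, M)."""
    return (np.asarray(phi, dtype=float) ** 2) @ hat_basis(lam, knots).T


def osd_descriptor(A, G):
    """f(x) = A g(x) applied row-wise: A is (Q, M), G is (N, M) -> (N, Q)."""
    return G @ np.asarray(A, dtype=float).T
```

Since the geometry vectors depend only on the shape, they can be precomputed once; learning then touches only the $Q \times M$ matrix.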

4. Convolutional neural networks on manifolds 
4.1. Geodesic convolution 

We introduce a notion of convolution on non-Euclidean
domains that follows the 'correlation with template' idea by
employing a local system of geodesic polar coordinates constructed
at point $x$, shown in Figure 1, to extract patches on
the manifold. The radial coordinate is constructed as $\rho$-level
sets $\{x' : d_X(x, x') = \rho\}$ of the geodesic (shortest path)
distance function for $\rho \in [0, \rho_0]$; we call $\rho_0$ the radius of the
geodesic disc.$^1$ Empirically, we see that choosing a sufficiently
small $\rho_0 \approx 1\%$ of the geodesic diameter of the shape
produces valid topological discs. The angular coordinate is
constructed as a set of geodesics $\Gamma(x, \theta)$ emanating from $x$
in direction $\theta$; such rays are perpendicular to the geodesic
distance level sets. Note that the choice of the origin of the
angular coordinate is arbitrary. For boundary points, the procedure
is very similar, with the only difference that instead
of mapping into a disc we map into a half-disc.

$^1$ If the radius $\rho_0$ of the geodesic ball $B_{\rho_0}(x) = \{x' : d_X(x, x') \leq \rho_0\}$
is sufficiently small w.r.t. the local convexity radius of the manifold,
then the resulting ball is guaranteed to be a topological disc.



Figure 1: Construction of local geodesic polar coordinates
on a manifold. Left: examples of local geodesic patches,
center and right: example of angular and radial weights $v_\theta$,
$v_\rho$, respectively (red denotes larger weights).


Let $\Omega(x) : B_{\rho_0}(x) \to [0, \rho_0] \times [0, 2\pi)$ denote the bijective
map from the manifold into the local geodesic polar
coordinates $(\rho, \theta)$ around $x$, and let
$(D(x)f)(\rho, \theta) = (f \circ \Omega^{-1}(x))(\rho, \theta)$ be the patch operator interpolating $f$ in
the local coordinates. We can regard $D(x)f$ as a 'patch' on
the manifold and use it to define what we term the geodesic
convolution (GC),

$$(f \star a)(x) = \sum_{\theta, r} a(\theta + \Delta\theta, r) (D(x)f)(r, \theta), \qquad (7)$$

where $a(\theta, r)$ is a filter applied on the patch. Due to angular
coordinate ambiguity, the filter can be rotated by an arbitrary
angle $\Delta\theta$.
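To make the rotation ambiguity concrete, the following sketch evaluates the geodesic convolution of a single discretized patch with a filter for every cyclic shift of the angular bins. The name `geodesic_conv` and the array layout (angular bins along the first axis, radial bins along the second) are assumptions for illustration.

```python
import numpy as np


def geodesic_conv(patch, filt):
    """Correlate one patch with all angular rotations of a filter.

    patch, filt: (N_theta, N_rho) arrays in local geodesic polar coordinates.
    Returns an (N_theta,) array whose entry s is
    sum_{theta, r} filt[(theta + s) mod N_theta, r] * patch[theta, r].
    """
    n_theta = filt.shape[0]
    return np.array([
        np.sum(np.roll(filt, -s, axis=0) * patch)   # filter rotated by s bins
        for s in range(n_theta)
    ])
```

Rotating the patch by one angular bin only permutes these responses cyclically, so any permutation-invariant reduction over them (such as a maximum) does not depend on the arbitrary choice of angular origin.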

Patch operator Kokkinos et al. [24] construct the patch
operator as

$$(D(x)f)(\rho, \theta) = \int_X v_{\rho,\theta}(x, x') f(x') \, dx', \qquad (8)$$

$$v_{\rho,\theta}(x, x') = \frac{v_\rho(x, x') v_\theta(x, x')}{\int_X v_\rho(x, x') v_\theta(x, x') \, dx'}. \qquad (9)$$

The radial interpolation weight is a Gaussian
$v_\rho(x, x') \propto e^{-(d_X(x, x') - \rho)^2 / \sigma_\rho^2}$ of the geodesic distance from $x$, centered
around $\rho$ (see Figure 1, right). The angular weight is a
Gaussian $v_\theta(x, x') \propto e^{-d_X^2(\Gamma(x, \theta), x') / \sigma_\theta^2}$ of the point-to-set
distance $d_X(\Gamma(x, \theta), x') = \min_{x'' \in \Gamma(x, \theta)} d_X(x'', x')$ to the
geodesic $\Gamma(x, \theta)$ (see Figure 1, center).

Discrete patch operator On triangular meshes, a discrete
local system of geodesic polar coordinates has $N_\theta$ angular
and $N_\rho$ radial bins. Starting with a vertex $i$, we first partition
the 1-ring of $i$ by $N_\theta$ rays into equi-angular bins, aligning the
first ray with one of the edges (Figure 2). Next, we propagate
the rays into adjacent triangles using an unfolding procedure
resembling one used in [23], producing poly-lines that form
the angular bins (see Figure 2). Radial bins are created as
level sets of the geodesic distance function computed using
fast marching [23].









Figure 2: Construction of local geodesic polar coordinates
on a triangular mesh. Shown clockwise: division of the 1-ring
of vertex $x_i$ into $N_\theta$ equi-angular bins; propagation of a ray
(bold line) by unfolding the respective triangles (marked in
green).

We represent the discrete patch operator as an $N_\theta N_\rho N \times N$
matrix applied to a function defined on the mesh vertices
and producing the patches at each vertex. The matrix is
very sparse since only the values of the function at a few nearby
vertices contribute to each local geodesic polar bin.
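How such a sparse matrix is applied can be sketched as follows. The real operator would be built from geodesic distances and unfolded rays; here `toy_patch_operator` just fills each row with a few random, normalized weights (a placeholder, not the actual construction), and the row ordering (vertex, then angular bin, then radial bin) is an assumption of this example.

```python
import numpy as np
import scipy.sparse as sp


def toy_patch_operator(n, n_theta, n_rho, nnz_per_row=3, seed=0):
    """Random stand-in for the (N_theta * N_rho * N) x N patch operator.

    Each row holds a few positive weights that sum to one, mimicking the
    normalized interpolation weights; the sparsity pattern is random.
    """
    rng = np.random.default_rng(seed)
    n_rows = n * n_theta * n_rho
    rows = np.repeat(np.arange(n_rows), nnz_per_row)
    cols = rng.integers(0, n, size=n_rows * nnz_per_row)
    w = rng.random((n_rows, nnz_per_row)) + 0.1
    w /= w.sum(axis=1, keepdims=True)
    return sp.csr_matrix((w.ravel(), (rows, cols)), shape=(n_rows, n))


def apply_patch_operator(D, f, n_theta, n_rho):
    """Extract all patches of a scalar function f (N,) -> (N, N_theta, N_rho)."""
    return (D @ f).reshape(f.shape[0], n_theta, n_rho)
```

Because each row carries normalized weights, a constant function is mapped to constant patches, which is a convenient sanity check for any implementation of the operator.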

4.2. Geodesic Convolutional Neural Networks 

Using the notion of geodesic convolution, we are now
ready to extend CNNs to manifolds. GCNN consists of
several layers that are applied sequentially, i.e. the output
of the previous layer is used as the input into the subsequent
one (see Figure 3). We distinguish between the following
types of layers:

Linear (LIN) layer typically follows the input layer and
precedes the output layer to adjust the input and output
dimensions by means of a linear combination,

$$f_q^{\mathrm{out}}(x) = \xi\left( \sum_{p=1}^{P} w_{qp} f_p^{\mathrm{in}}(x) \right); \quad q = 1, \dots, Q, \qquad (10)$$

optionally followed by a non-linear function such as the
ReLU, $\xi(t) = \max\{0, t\}$.

Geodesic convolution (GC) layer replaces the convolutional
layer used in classical Euclidean CNNs. Due to the
angular coordinate ambiguity, we compute the geodesic convolution
result for all $N_\theta$ rotations of the filters,

$$f_{\Delta\theta, q}^{\mathrm{out}}(x) = \sum_{p=1}^{P} (f_p \star a_{\Delta\theta, qp})(x), \quad q = 1, \dots, Q, \qquad (11)$$

where $a_{\Delta\theta, qp}(\theta, r) = a_{qp}(\theta + \Delta\theta, r)$ are the coefficients
of the $p$th filter in the $q$th filter bank rotated by
$\Delta\theta = 0, \frac{2\pi}{N_\theta}, \dots, \frac{2\pi (N_\theta - 1)}{N_\theta}$, and the convolution is understood in
the sense of (7).

Angular max-pooling (AMP) is a fixed layer used in conjunction
with the GC layer, that computes the maximum over
the filter rotations,

$$f_p^{\mathrm{out}}(x) = \max_{\Delta\theta} f_{\Delta\theta, p}^{\mathrm{in}}(x), \quad p = 1, \dots, P = Q, \qquad (12)$$

where $f_{\Delta\theta, p}^{\mathrm{in}}$ is the output of the GC layer (11).

Fourier transform magnitude (FTM) layer is another
fixed layer that applies the patch operator to each input dimension,
followed by Fourier transform w.r.t. the angular
coordinate and absolute value,

$$f_p^{\mathrm{out}}(\rho, \omega) = \left| \sum_{\theta} e^{-i \omega \theta} (D(x) f_p^{\mathrm{in}})(\rho, \theta) \right|, \qquad (13)$$

$p = 1, \dots, P = Q$. The Fourier transform translates rotational
ambiguity into complex phase ambiguity, which is
removed by taking the absolute value [25, 24].

Covariance (COV) layer is used in applications such as
retrieval where one needs to aggregate the point-wise descriptors
and produce a global shape descriptor [46],

$$\mathbf{f}^{\mathrm{out}} = \int_X (\mathbf{f}^{\mathrm{in}}(x) - \boldsymbol{\mu}) (\mathbf{f}^{\mathrm{in}}(x) - \boldsymbol{\mu})^\top \, dx, \qquad (14)$$

where $\mathbf{f}^{\mathrm{in}}(x) = (f_1^{\mathrm{in}}(x), \dots, f_P^{\mathrm{in}}(x))^\top$ is a $P$-dimensional
input vector, $\boldsymbol{\mu} = \int_X \mathbf{f}^{\mathrm{in}}(x) \, dx$, and $\mathbf{f}^{\mathrm{out}}$ is a $P \times P$ matrix
column-stacked into a $P^2$-dimensional vector.
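The layer types can be sketched compactly in NumPy, assuming the patches have already been extracted into a dense array. This is an illustrative forward pass only (no training), with hypothetical function names and an assumed tensor layout of (vertices, channels, angular bins, radial bins); the COV sketch discretizes the integrals with per-vertex area weights and follows the definition of the mean literally.

```python
import numpy as np


def lin_layer(feats, weights, relu=True):
    """LIN: feats (N, P), weights (Q, P) -> (N, Q), optionally followed by ReLU."""
    out = feats @ weights.T
    return np.maximum(out, 0.0) if relu else out


def gc_layer(patches, filters):
    """GC: patches (N, P, T, R), filters (Q, P, T, R) -> (N, T, Q).

    Output axis 1 indexes the T rotations of the filters.
    """
    n_theta = filters.shape[2]
    out = [np.einsum('nptr,qptr->nq', patches, np.roll(filters, -s, axis=2))
           for s in range(n_theta)]
    return np.stack(out, axis=1)


def amp_layer(gc_out):
    """AMP: maximum over filter rotations, (N, T, Q) -> (N, Q)."""
    return gc_out.max(axis=1)


def ftm_layer(patches):
    """FTM: magnitude of the DFT along the angular axis, (N, P, T, R) -> same shape."""
    return np.abs(np.fft.fft(patches, axis=2))


def cov_layer(feats, areas):
    """COV: area-weighted covariance of (N, P) features -> (P*P,) vector.

    Assumption: `areas` (N,) are vertex area elements used as quadrature weights.
    """
    mu = areas @ feats
    centered = feats - mu
    cov = (centered * areas[:, None]).T @ centered
    return cov.flatten(order='F')                 # column-stacked
```

Cyclically shifting the angular axis of the input patches leaves the outputs of both `amp_layer(gc_layer(...))` and `ftm_layer` unchanged, which is the invariance to the choice of angular origin that these two fixed layers are meant to provide.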


5. Comparison to previous approaches 

Our approach is perhaps the most natural way of generalizing
CNNs to manifolds, where convolutions are performed
by sliding a window over the manifold, and local geodesic
coordinates are used in place of image 'patches'. Such patches
allow capturing local anisotropic structures. Our method is
generalizable and, unlike spectral approaches, does not rely
on the approximate invariance of Laplacian eigenfunctions
across the shapes.

Spectral descriptors can be obtained as particular configurations
of GCNN applied on geometry vectors input. HKS
[45] and WKS [1] descriptors are obtained by using a fixed
LIN layer configured to produce low- or band-pass filters,
respectively. OSD [29] is obtained by using a learnable
LIN layer. Intrinsic shape context [24] is obtained by using
a fixed LIN layer configured to produce HKS or WKS
descriptors, followed by a fixed FTM layer.













[Figure 3 diagram: Input (M-dim), LIN, ReLU, GC (filter bank of P filters), AMP, Output (Q-dim)]

Figure 3: The simple GCNN1 architecture containing one convolutional layer applied to M = 150-dimensional geometry
vectors (input layer) of a human shape, to produce a Q = 16-dimensional feature descriptor (output layer).


Spectral nets [9] are a spectral formulation of CNNs using
the notion of generalized (non shift-invariant) convolution
that relies on the analogy between the classical Fourier transform
and the Laplace-Beltrami eigenbasis, and the fact that
the convolution operator is diagonalized by the Fourier transform.
The main drawback of this approach is that while it
makes it possible to extend CNNs to a non-Euclidean domain (in particular,
the authors considered a graph), it does not generalize
across different domains; the convolution coefficients are
expressed in a domain-dependent basis. Another drawback
of spectral nets is that they do not use locality.

Localized spectral nets [5] are an extension of [9] using
the Windowed Fourier transform (WFT) [40] on manifolds.
Due to localization, this method has better generalization
abilities; however, it might have problems in the case of
strongly non-isometric deformations due to the variability of
the Laplacian eigenfunctions. Furthermore, while WFT allows
capturing local structures, it is isotropic, i.e., insensitive
to orientations.

6. Applications 

The GCNN model can be thought of as a non-linear
hierarchical parametric function $\psi_\Theta(F)$, where
$F = (\mathbf{f}(x_1), \dots, \mathbf{f}(x_N))$ is a $P \times N$ matrix of input features (such
as HKS, WKS, geometry vectors, or anything else) at all
the points of the mesh, and $\Theta$ denotes the parameters of
all the layers. Depending on the application at hand, these
parameters are learned by minimizing some loss function.
We describe three examples of such task-specific losses.

Invariant descriptors Applying the GCNN point-wise on
some input feature vector $\mathbf{f}(x)$, the output $\psi_\Theta(\mathbf{f}(x))$ can be
regarded as a dense local descriptor at point $x$. Our goal is
to make the output of the network as similar as possible at
corresponding points (positives) across a collection of shapes,
and as dissimilar as possible at non-corresponding points
(negatives). For this purpose, we use a Siamese network
configuration [6, 18, 41], composed of two identical copies
of the same GCNN model sharing the same parameterization
and fed by pairs of knowingly similar or dissimilar samples,
and minimize the Siamese loss

$$\ell(\Theta) = (1 - \gamma) \sum_{i=1}^{|T_+|} \| \psi_\Theta(\mathbf{f}_i) - \psi_\Theta(\mathbf{f}_i^+) \|^2 + \gamma \sum_{i=1}^{|T_-|} \left( \mu - \| \psi_\Theta(\mathbf{f}_i) - \psi_\Theta(\mathbf{f}_i^-) \| \right)_+^2, \qquad (15)$$

where $\gamma \in [0, 1]$ is a parameter trading off between the positive
and negative losses, $\mu$ is a margin, $(t)_+ = \max\{0, t\}$,
and $T_\pm = \{(\mathbf{f}_i, \mathbf{f}_i^\pm)\}$ denotes the sets of positive and negative
pairs, respectively.
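A minimal NumPy sketch of such a Siamese loss is given below, evaluated on descriptors that have already been computed by the two network copies. The function name `siamese_loss` is hypothetical, and the squared hinge on the negative term is an assumption of this sketch.

```python
import numpy as np


def siamese_loss(pos_a, pos_b, neg_a, neg_b, gamma=0.5, mu=1.0):
    """Contrastive loss over descriptor pairs.

    pos_a, pos_b: (n_pos, Q) descriptors of corresponding points
    neg_a, neg_b: (n_neg, Q) descriptors of non-corresponding points
    gamma:        trade-off between positive and negative terms, in [0, 1]
    mu:           margin for the negative pairs
    """
    d_pos = np.linalg.norm(pos_a - pos_b, axis=1)
    d_neg = np.linalg.norm(neg_a - neg_b, axis=1)
    positive_term = np.sum(d_pos ** 2)                       # pull positives together
    negative_term = np.sum(np.maximum(0.0, mu - d_neg) ** 2)  # push negatives past margin
    return (1.0 - gamma) * positive_term + gamma * negative_term
```

Negative pairs that are already farther apart than the margin contribute nothing, so the optimization concentrates on the pairs that are still confusable.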

Shape correspondence Finding the correspondence in a
collection of shapes can be posed as a labelling problem,
where one tries to label each vertex of a given query shape
$X$ with the index of a corresponding point on some reference
shape $Y$ [37]. Let $y_1, \dots, y_{N'}$ be the vertices of $Y$, and let
$y_{j_i}$ denote the vertex corresponding to $x_i$ for $i = 1, \dots, N$.
GCNN applied point-wise on $X$ is used to produce an $N'$-dimensional
vector encoding the probability distribution on
$Y$, which acts as a 'soft correspondence'. The multinomial
regression loss

$$\ell(\Theta) = -\sum_{i=1}^{|T|} \mathbf{e}_{j_i}^\top \log \psi_\Theta(\mathbf{f}(x_i)) \qquad (16)$$

is minimized on a training set of known correspondences
$T = \{(\mathbf{f}(x_i), j_i)\}$ to achieve the optimal correspondence (here
$\mathbf{e}_i$ is a unit vector with a one at index $i$).
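The labelling formulation can be sketched as a softmax cross-entropy over reference vertices. The sketch assumes that the last layer outputs unnormalized scores (logits) that are turned into a probability distribution by a softmax; that choice, and the names `log_softmax`, `correspondence_loss`, and `hard_correspondence`, are assumptions of this example.

```python
import numpy as np


def log_softmax(logits):
    """Numerically stable log-softmax along the last axis: (N, N') -> (N, N')."""
    z = logits - logits.max(axis=1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=1, keepdims=True))


def correspondence_loss(logits, targets):
    """Multinomial regression loss.

    logits:  (N, N') scores for each query vertex over the N' reference vertices
    targets: (N,) integer indices j_i of the ground-truth reference vertices
    Returns minus the summed log-probability of the correct reference vertex.
    """
    log_p = log_softmax(logits)
    return -log_p[np.arange(len(targets)), targets].sum()


def hard_correspondence(logits):
    """Convert the soft correspondence to a point map by taking the maximum."""
    return logits.argmax(axis=1)
```

Selecting the entry at index $j_i$ is equivalent to the inner product with the unit vector $\mathbf{e}_{j_i}$.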





























Shape retrieval In the shape retrieval application, we are
interested in producing a global shape descriptor that discriminates
between shape classes (note that in a sense this
is the converse of invariant descriptors for correspondence,
which we wanted to be oblivious to different classes). In
order to aggregate the local features we use the COV layer
in GCNN and regard $\psi_\Theta(F)$ as a global shape descriptor.
Training is done by minimizing the Siamese loss, where
positives and negatives are shapes from the same and different
classes, respectively.

7. Results 

We used the FAUST [4] dataset containing scanned human
shapes in different poses and the TOSCA [7] dataset
containing synthetic models of humans in a variety of near-isometric
deformations. The meshes in TOSCA were resampled
to 10K vertices; FAUST shapes contained 6.8K
points. All shapes were scaled to unit geodesic diameter.
GCNN was implemented in Theano [3]. Geodesic patches
were generated using the code and settings of [24] with
$\rho_0 = 1\%$ of the geodesic diameter. Training was performed using
the Adadelta stochastic optimization algorithm [50] for a
maximum of 2.5K updates. Typical training times on FAUST
shapes were approximately 30 and 50 minutes for one- and
two-layer models (GCNN1 and GCNN2, respectively). The
application of a trained GCNN model to compute feature
descriptors was very efficient: 75K and 45K vertices/sec for
the GCNN1 and GCNN2 models, respectively. Training and
testing were done on disjoint sets. As the input to GCNN,
we used M = 150-dimensional geometry vectors computed
according to (5)-(6) using B-spline bases. Laplace-Beltrami
operators were discretized using the cotangent formula (3);
K = 300 eigenfunctions were computed using the MATLAB
eigs function.

7.1. Intrinsic shape descriptors 

We first used GCNN to produce dense intrinsic pose-
and subject-invariant descriptors for human shapes, following
nearly verbatim the experimental setup of [29]. For
reference, we compared GCNN to HKS [45], WKS [1],
and OSD [29] using the code and settings provided by
the respective authors. All the descriptors were Q = 16-dimensional
as in [29]. We used two configurations: GCNN1
(150-dim input, LIN16+ReLU, GC16+AMP, shown in Figure 3),
and GCNN2 (same as GCNN1 with additional ReLU,
FTM, LIN16 layers). Training of GCNN was done using the
loss (15) with positive and negative sets of vertex pairs generated
on the fly. On the FAUST dataset, we used subjects
1-7 for training, subject 8 for validation, and subjects 9-10
for testing. On TOSCA, we tested on all the deformations of
the Victoria shape.

Figure 4 depicts the Euclidean distance in the descriptor
space between the descriptor at a selected point and the rest
of the points on the same shape as well as its transformations.
GCNN descriptors manifest good localization (better
than HKS), are more discriminative (fewer spurious minima
than WKS and OSD), and are robust to different
kinds of noise, including isometric and non-isometric deformations,
geometric and topological noise, different sampling,
and missing parts.

Quantitative descriptor evaluation was done using three
criteria: cumulative match characteristic (CMC), receiver
operating characteristic (ROC), and the Princeton protocol
[22]. The CMC evaluates the probability of a correct correspondence
among the k nearest neighbors in the descriptor
space. The ROC measures the percentage of positive and
negative pairs falling below various thresholds of their distance
in the descriptor space (true positive and negative
rates, respectively). The Princeton protocol counts the
percentage of nearest-neighbor matches that are at most
r-geodesically distant from the groundtruth correspondence.
Figure 5 (first row) shows the performance of different descriptors
on the FAUST dataset. We observe that GCNN
descriptors significantly outperform other descriptors, and
that the more complex model (GCNN2) further boosts performance.
In order to test the generalization capability of
the learned descriptors, we applied OSD and GCNN learned
on the FAUST dataset to TOSCA shapes (Figure 5, second
row). We see that the learned model transfers well to a new
dataset.
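As an illustration of the CMC criterion, the sketch below computes the fraction of query points whose ground-truth match is among the k nearest neighbors in descriptor space. It is a brute-force version with a hypothetical name `cmc_curve`, intended for small inputs only; it is not the evaluation code used for the reported results.

```python
import numpy as np


def cmc_curve(desc_query, desc_ref, gt, k_max):
    """Cumulative match characteristic.

    desc_query: (Nq, Q) descriptors on the query shape
    desc_ref:   (Nr, Q) descriptors on the reference shape
    gt:         (Nq,) index of the correct reference point for each query point
    Returns a (k_max,) array; entry k-1 is the fraction of queries whose
    correct match is among the k nearest reference descriptors.
    """
    dist = np.linalg.norm(desc_query[:, None, :] - desc_ref[None, :, :], axis=2)
    order = np.argsort(dist, axis=1)                   # nearest first
    rank = np.argmax(order == gt[:, None], axis=1)     # 0-based rank of the true match
    return np.array([(rank < k).mean() for k in range(1, k_max + 1)])
```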

7.2. Shape correspondence 

To show the application of GCNN for computing intrinsic
correspondence, we reproduced the experiment of Rodolà et
al. [37] on the FAUST dataset, replacing their random forest
with a GCNN architecture GCNN3 containing three convolutional
layers (input: 150-dimensional geometry vectors,
LIN16+ReLU, GC32+AMP+ReLU, GC64+AMP+ReLU,
GC128+AMP+ReLU, LIN256, LIN6890). The zeroth FAUST
shape containing $N' = 6890$ vertices was used as reference;
for each point on the query shape, the output of GCNN representing
the soft correspondence as a 6890-dimensional vector
was converted into a point correspondence by taking the
maximum. Training was done by minimizing the loss (16);
training and test sets were as in the previous experiment.
Figure 6 shows the performance of our method evaluated
using the Princeton benchmark, and Figure 7 shows correspondence
examples where colors are transferred using raw
point-wise correspondence as input to the functional maps
algorithm. GCNN shows significantly better performance
than previous methods [22, 35, 37].

7.3. Shape retrieval 

In our final experiment, we performed pose-invariant
shape retrieval on the FAUST dataset. This is a hard fine-grained
classification problem since some of the human
subjects look nearly identical. We used a GCNN architecture
with one convolutional layer (input: 16-dimensional
HKS descriptors, LIN8, GC8+AMP, COV), producing a
64-dimensional output used as the global shape descriptor.
The training set consisted of five poses per subject (a total of 50
shapes); testing was performed on the 50 remaining shapes
in a leave-one-out fashion. Evaluation was done in terms
of precision (percentage of retrieved shapes matching the
query class) and recall (percentage of shapes from the query
class that is retrieved). Figure 8 shows the PR curve. For
comparison, we show the performance of other descriptors
(HKS, WKS, and OSD) aggregated into a global covariance
shape descriptor. GCNN significantly outperforms all other
methods.

Figure 4: Normalized Euclidean distance between the descriptor at a reference point on the shoulder (white sphere) and the
descriptors computed at the rest of the points for different transformations (shown left-to-right: near isometric deformations,
non-isometric deformations, topological noise, geometric noise, uniform/non-uniform subsampling, missing parts). Cold and
hot colors represent small and large distances, respectively; distances are saturated at the median value. Ideal descriptors would
produce a distance map with a sharp minimum at the corresponding point and no spurious local minima at other locations.

Figure 6: Performance of shape correspondence on the
FAUST dataset evaluated using the Princeton benchmark.
Higher curve corresponds to better performance.

8. Conclusions 

We presented GCNN, a generalization of CNNs that makes it possible
to learn hierarchical task-specific features on non-Euclidean
manifolds for applications such as shape correspondence
or retrieval. Our model is very generic and flexible, and
can be made arbitrarily complex by stacking multiple layers.
Applying GCNN on other shape representations such
as point clouds could be achieved by modifying the local
geodesic charting procedure. Though in this paper we used
intrinsic spectral properties of the shape as the input to
the network, GCNN can be applied on any function defined
on the manifold, and it would be particularly natural to use
it to construct descriptors of textured surfaces.

Figure 5: Performance of different descriptors measured using the CMC (left), ROC (center) and Princeton protocol for
nearest-neighbor correspondence (right); higher curves correspond to better performance. First row shows results for GCNN
trained and tested on disjoint sets of the FAUST dataset. Second row shows results for a transfer learning experiment where
the net has been trained on FAUST and applied to the TOSCA test set. GCNN (red and blue curves) significantly outperforms
other standard descriptors.

Figure 7: Example of correspondence obtained with GCNN
(bottom) and random forest (top). Similar colors encode
corresponding points.

Figure 8: Performance (in terms of Precision-Recall) of
shape retrieval on the FAUST dataset using different descriptors.
Higher curve corresponds to better performance.